Topics covered

  • Calculating correlations between asset classes
  • Heatmap visualisation using seaborn
  • Network analysis and visualisation using NetworkX
  • Minimum spanning trees
  • Interactive visualisations using Plotly

Introduction

Network analysis is an increasingly popular way to analyse and visualise complex relationships in data intuitively.

One interesting application of network analytics is visualising relationships between asset classes in the finance industry. We are always told that equities and bonds tend to move in opposite directions and certain assets such as gold and the Japanese Yen are 'safe-haven' assets and therefore behave in similar ways. But what does that actually look like and are there any other interesting relationships between asset classes? Networks can be a great way to convey this information at a high level and help understand broad market dynamics.

Information on the correlations between asset classes can be particularly useful for investors wanting to diversify their portfolio. Most people understand that diversification can mitigate portfolio risk, but this is not always the case. For instance, if you 'diversified' your portfolio by investing in two different stocks which are strongly correlated with each other (i.e. if stock A goes up 2% then stock B also tends to go up by about 2%), the overall risk of the portfolio has barely been reduced: a negative factor affecting stock A is also likely to affect stock B, and both prices will trend downwards. Diversification is therefore most effective when the assets held are weakly correlated, or uncorrelated, with each other. Good explanations of this phenomenon can be found at Quantdare and the Investors Chronicle.
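To put rough numbers on this, here is a minimal sketch using the standard formula for the volatility of a two-asset portfolio. The 20% volatilities and the 50/50 weights are made-up inputs for illustration, not values from the dataset used later in this post.

```python
import math


def portfolio_volatility(w_a, vol_a, vol_b, rho):
    """Volatility of a two-asset portfolio with weights w_a and (1 - w_a)."""
    w_b = 1.0 - w_a
    variance = (
        (w_a * vol_a) ** 2
        + (w_b * vol_b) ** 2
        + 2 * w_a * w_b * rho * vol_a * vol_b
    )
    return math.sqrt(variance)


# hypothetical inputs: two stocks, each with 20% annualised volatility, held 50/50
for rho in (1.0, 0.5, 0.0, -0.5):
    vol = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"correlation {rho:+.1f} -> portfolio volatility {vol:.1%}")
```

With a correlation of 1 the portfolio is exactly as volatile as either stock on its own, and the volatility falls as the correlation falls.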

In this post I will explain how to visualise asset correlations as an interactive network diagram which gives the user a high-level overview of the relationships between asset classes. From this information they can identify areas of risk as well as opportunities to diversify their portfolio. We will cover the basics of calculating correlations between asset price returns using numpy and pandas, creating a network graph with NetworkX, and finally displaying the network as an interactive visualisation with Plotly.

Library imports

  • numpy and pandas to read and manipulate the raw data
  • networkx to generate the network representation of the data and create some initial graph visualisations
  • matplotlib and seaborn for general visualisations and plot formatting
  • Plotly to create the final interactive visualisation

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import plotly.graph_objs as go
import seaborn as sns
from IPython.display import HTML, display
from plotly.offline import init_notebook_mode

# plotly offline mode
init_notebook_mode(connected=True)

Load data

For this analysis we will use a dataset containing the daily adjusted closing prices of 39 major ETFs which represent different asset classes covering equities, bonds, currencies and commodities. The data covers a period of 4 years between November 2013 and November 2017.

raw_asset_prices_df = pd.read_csv("data/daily_asset_prices.csv", index_col="Date")

# show first five rows
raw_asset_prices_df.head()
Bonds Global Commodities DOW Emerg Markets EAFE Emerg Markets Bonds Pacifix ex Japan Germany Italy Japan ... Europe Pacific VXX Materials Energy Finance Tech Utilities ST Corp Bond CHF
Date
2017-11-08 81.83 16.40 235.46 46.78 69.87 114.60 47.69 33.18 30.95 60.02 ... 58.20 72.77 33.53 58.70 69.82 26.25 64.01 55.70 104.96 94.5100
2017-11-07 81.89 16.43 235.42 46.56 69.64 114.65 47.22 33.07 31.09 59.65 ... 58.17 72.20 33.52 58.64 70.16 26.38 63.66 55.66 105.01 94.5400
2017-11-06 81.86 16.53 235.41 46.86 69.90 115.26 47.20 33.34 31.22 59.18 ... 58.67 71.98 33.34 58.58 70.25 26.75 63.63 55.00 105.00 94.7500
2017-11-03 81.80 16.22 235.18 46.34 69.80 115.42 47.09 33.39 31.22 59.19 ... 58.58 71.88 33.66 58.83 68.68 26.78 63.49 55.21 105.00 94.4400
2017-11-02 81.73 16.12 234.96 46.58 69.91 116.15 47.31 33.50 31.43 59.05 ... 58.69 71.89 33.71 58.86 68.48 26.89 62.99 55.01 105.04 94.6299

5 rows × 39 columns

df_shape = raw_asset_prices_df.shape
print(f"There are {df_shape[0]} rows and {df_shape[1]} columns in the dataset")
print(
    (
        f"Data time-period covers: {min(raw_asset_prices_df.index)} "
        f"to {max(raw_asset_prices_df.index)}"
    )
)
There are 1013 rows and 39 columns in the dataset
Data time-period covers: 2013-11-01 to 2017-11-08

Note: We can see that the csv file of asset prices contains 39 different assets and 1013 records for each asset. The time-series is in reverse order (newest to oldest) and there are no missing datapoints in the dataset.

Calculate asset price correlations

Convert to log returns

Before calculating the correlation matrix, it is important to normalise the dataset by converting the absolute asset prices into daily returns. This transformation is common for financial time-series because investors are typically interested in the returns on an asset rather than its absolute price. Working with returns also puts all assets on a comparable scale, regardless of their price level.

log_returns_df = pd.DataFrame()

# calculate log returns of each asset
# loop through each column in the dataframe and calculate the daily log returns
# add each log returns column to a new dataframe
for col in raw_asset_prices_df.columns:
    # dates are given in reverse order so need to set diff to -1.
    log_returns_df[col] = np.log(raw_asset_prices_df[col]).diff(-1)

# check output of log returns dataframe
log_returns_df.head()
Bonds Global Commodities DOW Emerg Markets EAFE Emerg Markets Bonds Pacifix ex Japan Germany Italy Japan ... Europe Pacific VXX Materials Energy Finance Tech Utilities ST Corp Bond CHF
Date
2017-11-08 -0.000733 -0.001828 0.000170 0.004714 0.003297 -0.000436 0.009904 0.003321 -0.004513 0.006184 ... 0.000516 0.007864 0.000298 0.001023 -0.004858 -0.004940 0.005483 0.000718 -0.000476 -0.000317
2017-11-07 0.000366 -0.006068 0.000042 -0.006423 -0.003727 -0.005306 0.000424 -0.008131 -0.004173 0.007911 ... -0.008559 0.003052 0.005384 0.001024 -0.001282 -0.013928 0.000471 0.011929 0.000095 -0.002219
2017-11-06 0.000733 0.018932 0.000977 0.011159 0.001432 -0.001387 0.002333 -0.001499 0.000000 -0.000169 ... 0.001535 0.001390 -0.009552 -0.004259 0.022602 -0.001121 0.002203 -0.003811 0.000000 0.003277
2017-11-03 0.000856 0.006184 0.000936 -0.005166 -0.001575 -0.006305 -0.004661 -0.003289 -0.006704 0.002368 ... -0.001876 -0.000139 -0.001484 -0.000510 0.002916 -0.004099 0.007906 0.003629 -0.000381 -0.002009
2017-11-02 0.000979 0.006223 0.003283 0.001289 0.003008 0.002759 0.005723 0.004488 0.009591 0.001186 ... 0.002559 0.002089 -0.011210 -0.007279 -0.002916 0.009341 0.000476 0.003642 0.000000 0.004553

5 rows × 39 columns
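To make the effect of diff(-1) concrete, here is a small sketch on a made-up price series stored newest first, like the dataset. The prices and dates are hypothetical. It also shows why log returns are convenient: they add up across days.

```python
import numpy as np
import pandas as pd

# hypothetical prices, newest first, mirroring the ordering of the dataset
prices = pd.Series(
    [121.0, 110.0, 100.0],
    index=["2017-01-03", "2017-01-02", "2017-01-01"],
)

# diff(-1) subtracts the following row (the previous day) from each row
log_returns = np.log(prices).diff(-1)

# the oldest row has no previous day, so its return is NaN;
# sum() skips NaN values by default
total_log_return = log_returns.sum()
simple_return = prices.iloc[0] / prices.iloc[-1] - 1

print(log_returns)
print(np.isclose(np.exp(total_log_return) - 1, simple_return))
```

The same NaN appears in the last (oldest) row of log_returns_df. corr() ignores missing values pairwise, so it does not affect the correlations.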

Calculate correlations matrix

To calculate the pairwise correlations between assets we can simply use the built-in pandas DataFrame.corr() method.

correlation_matrix = log_returns_df.corr()

# show first five rows of the correlation matrix
correlation_matrix.head()
Bonds Global Commodities DOW Emerg Markets EAFE Emerg Markets Bonds Pacifix ex Japan Germany Italy Japan ... Europe Pacific VXX Materials Energy Finance Tech Utilities ST Corp Bond CHF
Bonds Global 1.000000 -0.086234 -0.279161 -0.069623 -0.177521 0.296679 -0.104739 -0.188547 -0.201014 -0.159231 ... -0.181249 -0.141707 0.222886 -0.220603 -0.198536 -0.420264 -0.197672 0.301774 0.598105 0.250142
Commodities -0.086234 1.000000 0.305137 0.428909 0.369637 0.313791 0.400316 0.283711 0.331083 0.250509 ... 0.367019 0.341089 -0.224835 0.429541 0.677430 0.257454 0.225463 0.089290 0.028872 0.042324
DOW -0.279161 0.305137 1.000000 0.719260 0.793387 0.343817 0.688344 0.722882 0.666315 0.691672 ... 0.765251 0.769604 -0.797538 0.817422 0.655544 0.874902 0.849271 0.395247 -0.042967 -0.172730
Emerg Markets -0.069623 0.428909 0.719260 1.000000 0.796425 0.580454 0.815049 0.697222 0.660268 0.638955 ... 0.763111 0.811155 -0.668680 0.674630 0.607172 0.606749 0.686458 0.365031 0.123624 -0.018461
EAFE -0.177521 0.369637 0.793387 0.796425 1.000000 0.466883 0.790699 0.884385 0.830052 0.785198 ... 0.949368 0.872972 -0.721559 0.720030 0.610561 0.711714 0.725415 0.352270 0.050121 0.021760

5 rows × 39 columns
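By default corr() computes the Pearson correlation coefficient. As a sketch of what that means, the snippet below computes it by hand for two made-up return series and compares the result with pandas. The asset names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical daily log returns for two assets
returns = pd.DataFrame(
    {
        "asset_a": [0.010, -0.020, 0.015, 0.005, -0.010],
        "asset_b": [0.008, -0.015, 0.020, 0.000, -0.012],
    }
)


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x_centred = x - x.mean()
    y_centred = y - y.mean()
    return (x_centred * y_centred).sum() / np.sqrt(
        (x_centred ** 2).sum() * (y_centred ** 2).sum()
    )


manual = pearson(returns["asset_a"], returns["asset_b"])
from_pandas = returns.corr().loc["asset_a", "asset_b"]
print(round(manual, 4), round(from_pandas, 4))
```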

Correlations Heatmap

The conventional way to visualise correlations is via a heatmap. Before developing the network visualisation, we will quickly create a heatmap of the correlation matrix to check the output of the correlation calculations and to gain a high level insight into some of the relationships present in the data.

Seaborn has a very useful function called clustermap which visualises the matrix as a heatmap but also clusters the ETFs so that ETFs which behave similarly are next to each other. Clustered heatmaps can be a useful way of visualising correlations between attributes in a dataset, especially if the data is highly dimensional as it automatically reorders attributes which are similar to each other into clusters. This makes the heatmap more structured and readable so it is easier to identify relationships between ETFs and which asset classes behave similarly.
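To see what the reordering involves, here is a rough sketch of the idea using scipy's hierarchical clustering directly. I am assuming clustermap's defaults of average linkage on Euclidean distances between rows. The 4x4 matrix and asset names are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage

# hypothetical correlation matrix: two equity-like and two bond-like assets
labels = ["equity_1", "bond_1", "equity_2", "bond_2"]
corr = pd.DataFrame(
    [
        [1.0, -0.3, 0.9, -0.2],
        [-0.3, 1.0, -0.25, 0.8],
        [0.9, -0.25, 1.0, -0.3],
        [-0.2, 0.8, -0.3, 1.0],
    ],
    index=labels,
    columns=labels,
)

# cluster the rows (each row is one asset's correlation profile)
row_linkage = linkage(corr.to_numpy(), method="average", metric="euclidean")

# leaves_list gives the row order that places similar assets next to each other
order = leaves_list(row_linkage)
reordered = corr.iloc[order, order]
print(list(reordered.index))
```

In the reordered matrix the two equity-like assets sit next to each other, as do the two bond-like assets. This adjacency is what makes the blocks in a clustered heatmap stand out.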

display(HTML("<h3>Clustered Heatmap: Correlations between asset price returns</h3>"))
sns.clustermap(correlation_matrix, cmap="RdYlGn")
plt.show()

Clustered Heatmap: Correlations between asset price returns

The heatmap is colour coded using a divergent colourscale where strong positive correlations (correlation = 1) are dark green, uncorrelated assets are yellow (correlation = 0) and negatively correlated assets are red (correlation = -1).

The clustered heatmap visualisation already gives a good picture of the data and tells an interesting story:

  1. Broadly speaking, there are two major clusters of assets. These appear to be separated into equities and "non-equity" assets (e.g. bonds, currencies and precious metals). The heatmap shows that these two categories are generally negatively correlated or uncorrelated with each other. This is expected, as 'safe-haven' assets such as bonds, gold and currencies like the Japanese Yen tend to move in the opposite direction to equities, which are seen as a riskier asset class.
  2. ETFs tracking geographic regions which are close to each other are highly correlated. For example, the UK, Europe, Germany, Italy and France ETFs are highly correlated with each other, as are the Japan, Pacific ex Japan and China ETFs.
  3. The VXX ETF is strongly negatively correlated with equities.

Network Visualisations

Heatmaps are useful, but they can only convey one dimension of information (the magnitude of the correlation between two assets). For an investor deciding which asset classes to invest in, a heatmap does not answer important questions such as what the annualised returns and volatility of an asset class are.

We can use network graphs to investigate the initial findings from the heatmap further and visualise them in a more accessible way which encodes more information.

NetworkX

NetworkX is one of the most popular and useful Python libraries for analysing small and medium-sized networks.

In order to analyse the correlation matrix as a network we first need to convert the correlations between assets into an edge list. This is a list containing one entry for each connection between two ETFs in our data. Each entry holds the 'source' node (ETF), the 'target' node and the 'weight' (correlation) of the link between the two.

Create edge list

edges = correlation_matrix.stack().reset_index()
edges.columns = ["asset_1", "asset_2", "correlation"]

# remove self correlations
edges = edges.loc[edges["asset_1"] != edges["asset_2"]].copy()

# show the first 5 rows of the edge list dataframe.
edges.head()
   asset_1       asset_2                correlation
1  Bonds Global  Commodities            -0.086234
2  Bonds Global  DOW                    -0.279161
3  Bonds Global  Emerg Markets          -0.069623
4  Bonds Global  EAFE                   -0.177521
5  Bonds Global  Emerg Markets Bonds     0.296679
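Because the correlation matrix is symmetric, the edge list built this way contains every pair twice (A to B and B to A). That is harmless when building an undirected graph, but if a de-duplicated edge list is preferred, one option is to keep only the upper triangle of the matrix before stacking. The sketch below uses a made-up 3x3 matrix.

```python
import numpy as np
import pandas as pd

# hypothetical correlation matrix for three assets
labels = ["A", "B", "C"]
corr = pd.DataFrame(
    [[1.0, 0.8, -0.1], [0.8, 1.0, 0.3], [-0.1, 0.3, 1.0]],
    index=labels,
    columns=labels,
)

# keep only entries strictly above the diagonal; everything else becomes NaN
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# stack to long format and drop the NaN entries, leaving one row per unordered pair
unique_edges = upper.stack().dropna().reset_index()
unique_edges.columns = ["asset_1", "asset_2", "correlation"]
print(unique_edges)
```

This also removes the self correlations, since the diagonal is excluded by k=1.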

Create graph from edge list

Now that we have an edge list we need to feed that into the networkx library to create a graph. Note that this network is undirected as the correlation between assets is the same in both directions.

G0 = nx.from_pandas_edgelist(edges, "asset_1", "asset_2", edge_attr=["correlation"])

# print out the graph info
# check number of nodes and degrees are as expected
# (all should have degree = 38, i.e. average degree = 38)
# nx.info() was removed in NetworkX 3.0; printing the graph gives the same summary
print(G0)
Graph with 39 nodes and 741 edges
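As a sanity check on these numbers: 39 assets give 39 × 38 = 1,482 rows in the edge list, because each pair appears in both directions. These collapse to 39 × 38 / 2 = 741 undirected edges. The same bookkeeping can be sketched on a toy edge list with four made-up assets.

```python
import itertools

import networkx as nx
import pandas as pd

# hypothetical edge list which lists each pair in both directions, as stack() does
assets = ["A", "B", "C", "D"]
toy_edges = pd.DataFrame(
    [(a, b, 0.5) for a, b in itertools.permutations(assets, 2)],
    columns=["asset_1", "asset_2", "correlation"],
)

G = nx.from_pandas_edgelist(toy_edges, "asset_1", "asset_2", edge_attr=["correlation"])

n = G.number_of_nodes()
# rows in the edge list, undirected edges in the graph, and n(n-1)/2
print(len(toy_edges), G.number_of_edges(), n * (n - 1) // 2)
# every node is linked to every other node
print(all(degree == n - 1 for _, degree in G.degree()))
```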

Visualise the network

To visualise the graph we have just created, we can use a number of 'out-of-the-box' layouts which can be drawn using networkx, for example:

  • circular_layout - Position nodes on a circle.
  • random_layout - Position nodes uniformly at random in the unit square.
  • spectral_layout - Position nodes using the eigenvectors of the graph Laplacian.
  • spring_layout - Position nodes using Fruchterman-Reingold force-directed algorithm.
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(20, 20))

# save different layout functions in a list
layouts = [nx.circular_layout, nx.random_layout, nx.spring_layout, nx.spectral_layout]

# plot each different layout
for layout, ax in zip(layouts, axs.ravel()):
    nx.draw(
        G0,
        with_labels=True,
        node_size=700,
        node_color="#e1575c",
        edge_color="#363847",
        pos=layout(G0),
        ax=ax,
    )
    ax.set_title(layout.__name__, fontsize=20, fontweight="bold")

plt.show()

Note: Whilst these visualisations may look cool, they are not very useful in their current form as all nodes have the same number of edges (links), and each edge looks the same so no useful information about the correlations between ETFs can be gained.

Improved network visualisation

The network visualisation can be improved in a number of ways by thinking about the sort of information which we are looking to uncover from this analysis. It is important to think about what questions we wish to answer from the data and then design the visualisation to give the most insight into the question at hand. For the purposes of enhancing the visualisation, we will assume that the audience will be investors wanting to assess risk in their portfolio. Investors would want to identify which assets are correlated and uncorrelated with each other in order to assess the unsystematic risk in their portfolio. Therefore, from this visualisation the user would want to quickly understand:

  • which assets show strong/meaningful correlations (i.e. an absolute value above 0.5) with each other
  • are these correlations positive or negative
  • which are the most/least 'connected' assets. (i.e. which assets share the most/least strong correlations with other assets in the dataset)
  • which groups of assets behave similarly (i.e. which assets are correlated with the same type of other assets)

With this information, an investor could identify if they held a number of assets which behave the same (increased risk) and identify assets which show very few correlations with assets currently held in the portfolio and investigate these as a potential opportunity for diversification.

For a first iteration to improve the network visualisation we can take the circular layout graph from above and make the following changes:

  • reduce the number of connections between nodes by adding a threshold value for the strength of correlation
  • introduce colour to signify positive or negative correlations
  • scale the edge thickness to indicate the magnitude of correlation
  • scale the size of nodes to indicate which assets have the greatest number of strong correlations with the rest of the assets in the dataset

Remove edges below a threshold

# set a minimum absolute correlation; weaker edges are removed from the diagram
threshold = 0.5

# create a new graph from edge list
Gx = nx.from_pandas_edgelist(edges, "asset_1", "asset_2", edge_attr=["correlation"])

# list to store edges to remove
remove = []
# loop through edges in Gx and find correlations which are below the threshold
for asset_1, asset_2 in Gx.edges():
    corr = Gx[asset_1][asset_2]["correlation"]
    # add to the list of edges to remove if abs(corr) < threshold
    if abs(corr) < threshold:
        remove.append((asset_1, asset_2))

# remove edges contained in the remove list
Gx.remove_edges_from(remove)

print(str(len(remove)) + " edges removed")
530 edges removed
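An alternative, sketched below on a made-up edge list, is to filter the edge list with pandas before building the graph. One caveat: an asset with no strong correlations at all would then be missing from the graph entirely, whereas the loop above keeps it as an isolated node.

```python
import pandas as pd

# hypothetical edge list
toy_edges = pd.DataFrame(
    {
        "asset_1": ["A", "A", "B"],
        "asset_2": ["B", "C", "C"],
        "correlation": [0.82, -0.64, 0.12],
    }
)
threshold = 0.5

# keep edges whose absolute correlation is at or above the threshold
strong_edges = toy_edges[toy_edges["correlation"].abs() >= threshold]
print(strong_edges)
```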

Create colour, edge thickness and node size features

def assign_colour(correlation):
    if correlation <= 0:
        return "#ffa09b"  # red
    else:
        return "#9eccb7"  # green


def assign_thickness(correlation, benchmark_thickness=2, scaling_factor=3):
    return benchmark_thickness * abs(correlation) ** scaling_factor


def assign_node_size(degree, scaling_factor=50):
    return degree * scaling_factor


# assign colours to edges depending on positive or negative correlation
# assign edge thickness depending on magnitude of correlation
edge_colours = []
edge_width = []
for key, value in nx.get_edge_attributes(Gx, "correlation").items():
    edge_colours.append(assign_colour(value))
    edge_width.append(assign_thickness(value))

# assign node size depending on number of connections (degree)
node_size = []
for key, value in dict(Gx.degree).items():
    node_size.append(assign_node_size(value))
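Note that the scaling_factor in assign_thickness acts as an exponent, so the thickness grows much faster than the correlation itself. The short sketch below repeats the function so that it runs on its own and prints the thickness for a few example correlations.

```python
def assign_thickness(correlation, benchmark_thickness=2, scaling_factor=3):
    # same definition as above, repeated here so the snippet is self-contained
    return benchmark_thickness * abs(correlation) ** scaling_factor


for corr in (0.5, 0.7, 0.9, -0.9):
    print(f"{corr:+.1f} -> {assign_thickness(corr):.3f}")
```

A correlation of 0.9 is drawn almost six times thicker than one of 0.5, which is what gives the strongest links their visual prominence.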

Draw improved graph

sns.set(rc={"figure.figsize": (9, 9)})
font_dict = {"fontsize": 18}

nx.draw(
    Gx,
    pos=nx.circular_layout(Gx),
    with_labels=True,
    node_size=node_size,
    node_color="#e1575c",
    edge_color=edge_colours,
    width=edge_width,
)
plt.title("Asset price correlations", fontdict=font_dict)
plt.show()

The network visualisation has been improved in four main ways:

  • less cluttered: we have removed edges with weak correlations and kept only the edges with significant (actionable) correlations
  • identify type of correlation: simple intuitive colour scheme to show positive (green) or negative (red) correlations
  • identify strength of correlation: we now are able to assess the relative strength of correlations between nodes, with prominence given to the correlations with the greatest magnitude
  • identify the most connected nodes: the size of the nodes has been adjusted to emphasise which nodes have the greatest number of strong correlations with other nodes in the network

Looking at this network we could now quickly identify which assets are highly correlated and therefore may pose increased risk on the portfolio. The viewer could also identify and investigate further the assets with low degree of connectivity (smaller node size) as these are weakly correlated with the other assets in the sample and may provide an opportunity to diversify the portfolio.

The circular layout, however, does not group the nodes in a meaningful order; it simply places them in the order in which they were created. This makes it difficult to see which assets are most similar to each other in terms of their correlations with other nodes. We can improve this with a spring-based layout using the Fruchterman-Reingold algorithm [2], a force-directed method in which connected nodes attract each other and all nodes repel each other. Because only edges above the correlation threshold remain in the graph, the algorithm pulls strongly correlated nodes together, allowing the viewer to quickly identify groups of assets with similar properties.

nx.draw(
    Gx,
    pos=nx.fruchterman_reingold_layout(Gx),
    with_labels=True,
    node_size=node_size,
    node_color="#e1575c",
    edge_color=edge_colours,
    width=edge_width,
)
plt.title("Asset price correlations - Fruchterman-Reingold layout", fontdict=font_dict)
plt.show()

The Fruchterman-Reingold layout has successfully grouped the assets into clusters of strongly correlated assets. As seen before in the heatmap visualisation, there are distinct clusters of assets which behave similarly to each other. There is a cluster containing bond ETFs, a cluster of precious metal ETFs (silver, gold, goldminers), a cluster of currencies (CHF, USD, Euro) and a large cluster for equities.

However, the main cluster of equity ETFs is still very cluttered: the nodes are tightly packed and the nodes and labels overlap, making them difficult to read.

As the layout now places strongly correlated nodes close together in space, it is no longer necessary to keep every single edge, because proximity already implies a stronger correlation. We can also convert the node sizes back to a consistent (smaller) size, as the degree of each node will no longer be meaningful once edges are removed.
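Two details of the spring layout are worth knowing. By default it looks for an edge attribute called "weight" and treats every edge as weight 1 when that attribute is absent. It also starts from random positions, so the picture changes on every run. The sketch below shows how both could be controlled on a toy graph. The "strength" attribute name, the node names and the seed value are my own choices for illustration.

```python
import networkx as nx

# hypothetical graph with correlations stored on the edges
G = nx.Graph()
G.add_edge("A", "B", correlation=0.9)
G.add_edge("B", "C", correlation=0.6)
G.add_edge("C", "D", correlation=-0.7)

# store the magnitude under its own attribute so it can act as the spring strength
for _, _, data in G.edges(data=True):
    data["strength"] = abs(data["correlation"])

# weight names the edge attribute used as the attractive force;
# seed fixes the random starting positions so the layout is repeatable
pos = nx.spring_layout(G, weight="strength", seed=42)
pos_again = nx.spring_layout(G, weight="strength", seed=42)

print(pos)
```

nx.fruchterman_reingold_layout accepts the same arguments, so the same idea carries over to the plots in this post.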

Minimum spanning tree

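It is common in financial networks to use a minimum spanning tree [3,4,5] to visualise networks. A minimum spanning tree reduces the edges down to a subset which connects all the nodes together, without any cycles and with the minimum possible sum of edge weights. Here the edge weight is a distance derived from the correlation, d = sqrt(2(1 - correlation)), as used by Mantegna [3], so that the tree keeps the links between the most strongly correlated assets. This essentially provides a skeleton of the graph, minimising the number of edges and reducing the clutter in the network graph.

Kruskal's algorithm is used to calculate the minimum spanning tree and is fairly intuitive. However, NetworkX has a built-in function which calculates the minimum spanning tree for us.

# create the minimum spanning tree from Gx
# (after small correlations have been removed)
# minimum_spanning_tree reads the "weight" edge attribute by default, which these
# edges do not have, so store a distance on each edge and pass its name explicitly
for _, _, data in Gx.edges(data=True):
    data["distance"] = np.sqrt(2 * (1 - data["correlation"]))
mst = nx.minimum_spanning_tree(Gx, weight="distance")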

edge_colours = []

# assign edge colours
for key, value in nx.get_edge_attributes(mst, "correlation").items():
    edge_colours.append(assign_colour(value))


# draw minimum spanning tree. Set node size and width to constant
nx.draw(
    mst,
    with_labels=True,
    pos=nx.fruchterman_reingold_layout(mst),
    node_size=200,
    node_color="#e1575c",
    edge_color=edge_colours,
    width=1.2,
)

# set title
plt.title("Asset price correlations - Minimum Spanning Tree", fontdict=font_dict)
plt.show()

The improved graph has made the clusters of nodes more readable by reducing the node size and reducing the number of edges in the graph. However, reducing the clutter was at the expense of conveying some information about the nodes such as nodes with the most strong correlations and their relative strengths.
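For readers curious about Kruskal's algorithm mentioned above, here is a minimal pure-Python sketch. It is not the NetworkX implementation. It assumes each correlation is first converted to a distance, sqrt(2(1 - correlation)), so that strongly correlated assets are joined by short edges. The assets and correlations are made up.

```python
import math


def kruskal_mst(nodes, weighted_edges):
    """Return the edges of a minimum spanning tree (or forest) using Kruskal's algorithm."""
    parent = {node: node for node in nodes}

    def find(node):
        # follow parent pointers to the root, shortening the path as we go
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    # consider the shortest edges first
    for u, v, weight in sorted(weighted_edges, key=lambda edge: edge[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components, so no cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# hypothetical correlations between four assets
correlations = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.2,
    ("B", "C"): 0.7,
    ("C", "D"): 0.6,
    ("B", "D"): 0.1,
}
edges = [(u, v, math.sqrt(2 * (1 - rho))) for (u, v), rho in correlations.items()]

tree = kruskal_mst(["A", "B", "C", "D"], edges)
print([(u, v) for u, v, _ in tree])
```

The tree keeps the three strongest links (A-B, B-C and C-D) and drops the two weak ones, which would only have created cycles.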

Interactive visualisation

Finally, I will explore the use of interactive plotting libraries to enhance the user experience and alleviate some of the missing information issues with the use of informative tooltips.

Currently the network is very 'static' and does not convey any information other than the spatial relationships between strongly correlated assets. Although useful, this naturally leads the viewer to ask more questions about the properties of each individual node in a cluster, such as its historical returns and the nodes with which it is least correlated.

One way to include such information about each node is to use an interactive graph with informative tooltips. This can be achieved using the Plotly [6] Python library, which offers a Python API for creating interactive JavaScript graphs built on d3.js. The following code uses the Plotly API to create a graph with the following features:

  • interactivity, such as zoom, to focus on clusters of nodes
  • tooltips which provide useful information about a node when the user hovers over it
  • node sizes which scale with the magnitude of the annualised returns of each asset
  • node colours to show whether each asset had a positive or negative return over the dataset timeframe

Plotly

The Python code required for Plotly graphs can be a little more involved than out-of-the-box matplotlib or seaborn charts, but it allows for much more customisation.

The main benefit of using Plotly in this example is the use of tooltips. In these tooltips we can store lots of additional information about each ETF which we can access by hovering over the respective node in the graph.

There are obviously many different bits of information of use which we could incorporate into the tooltip. For this example I will add:

  • annualised returns of the asset class
  • annualised volatility of the asset class
  • the top and bottom 3 assets which the asset of interest is most/least strongly correlated with

Functions to get node information to populate tooltips

The following functions calculate the above quantities. It should be noted that Plotly tooltips are formatted using a subset of HTML. Therefore the input to the tooltip should be a string containing the relevant HTML tags for any formatting that is required.

def convert_rankings_to_string(ranking):
    """
    Concatenate list of nodes and correlations into a single html
    string (required format for the plotly tooltip)

    Inserts html "<br>" inbetween each item in order to add a new
    line in the tooltip
    """
    s = ""
    for r in ranking:
        s += r + "<br>"
    return s


def calculate_stats(returns=log_returns_df):
    """calculate annualised returns and volatility for all ETFs

    Returns:
        tuple: Outputs the annualised volatility and returns as a list of
            floats (for use in assigning node colours and sizes) and also
            as a lists of formatted strings to be used in the tool tips.
    """

    # log returns are additive, 252 trading days
    # (DataFrame.mean() is taken column by column, giving one value per asset)
    annualized_returns = list(returns.mean() * 252 * 100)

    annualized_volatility = [
        np.std(returns[col] * 100) * (252 ** 0.5) for col in list(returns.columns)
    ]

    # create string for tooltip
    annualized_volatility_2dp = [
        "Annualized Volatility: " "%.1f" % r + "%" for r in annualized_volatility
    ]
    annualized_returns_2dp = [
        "Annualized Returns: " "%.1f" % r + "%" for r in annualized_returns
    ]

    return (
        annualized_volatility,
        annualized_returns,
        annualized_volatility_2dp,
        annualized_returns_2dp,
    )


def get_top_and_bottom_three(df=correlation_matrix):
    """
    get a list of the top 3 and bottom 3 most/least correlated assets
    for each node.

    Args:
        df (pd.DataFrame): pandas correlation matrix

    Returns:
        top_3_list (list): list of lists containing the top 3 correlations
            (name and value)
        bottom_3_list (list): list of lists containing the bottom three
            correlations (name and value)
    """

    top_3_list = []
    bottom_3_list = []

    for col in df.columns:

        # exclude self correlation #reverse order of the list returned
        top_3 = list(np.argsort(abs(df[col]))[-4:-1][::-1])
        # bottom 3 list is returned in correct order
        bottom_3 = list(np.argsort(abs(df[col]))[:3])

        # get column index
        col_index = df.columns.get_loc(col)

        # find values based on index locations
        top_3_values = [df.index[x] + ": %.2f" % df.iloc[x, col_index] for x in top_3]
        bottom_3_values = [
            df.index[x] + ": %.2f" % df.iloc[x, col_index] for x in bottom_3
        ]

        top_3_list.append(convert_rankings_to_string(top_3_values))
        bottom_3_list.append(convert_rankings_to_string(bottom_3_values))

    return top_3_list, bottom_3_list
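The annualisation in calculate_stats relies on two scaling rules: the mean daily log return scales with the number of trading days, while the daily volatility scales with its square root. The second rule assumes that daily returns are independent of each other. A minimal sketch on made-up returns:

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year

# hypothetical daily log returns
daily = pd.Series([0.001, -0.002, 0.003, 0.000, 0.002])

annualised_return = daily.mean() * TRADING_DAYS
# ddof=0 gives the population standard deviation, matching np.std
annualised_volatility = daily.std(ddof=0) * np.sqrt(TRADING_DAYS)

print(f"{annualised_return:.1%} return, {annualised_volatility:.1%} volatility")
```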

Function to generate coordinates

Plotly does not have an 'out-of-the-box' network graph chart, so we need to 'imitate' the network layout by plotting the graph nodes as a scatter plot and adding a 'line' chart which draws the edges connecting the points. To achieve this we need a function which converts the Fruchterman-Reingold coordinates calculated using NetworkX into x and y series for the scatter plot. We also need to store the x and y coordinates of the start and end of each edge, which will be used to draw the 'line' chart.

The function calculates the x and y coordinates for the scatter plot (Xnodes and Ynodes) and the coordinates of the starting and ending positions of the lines connecting nodes (Xedges, Yedges):

def get_coordinates(G=mst):
    """Returns the positions of nodes and edges in a format
    for Plotly to draw the network
    """
    # get dict of node positions keyed by node name
    pos = nx.fruchterman_reingold_layout(G)

    Xnodes = [pos[n][0] for n in G.nodes()]
    Ynodes = [pos[n][1] for n in G.nodes()]

    Xedges = []
    Yedges = []
    for e in G.edges():
        # x coordinates of the nodes defining the edge e
        Xedges.extend([pos[e[0]][0], pos[e[1]][0], None])
        Yedges.extend([pos[e[0]][1], pos[e[1]][1], None])

    return Xnodes, Ynodes, Xedges, Yedges
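The None appended after each pair of coordinates is what keeps the edges separate: Plotly leaves a gap in a line wherever it meets a missing value, so a single trace can draw many disconnected segments. A tiny sketch with made-up positions shows the resulting lists.

```python
# hypothetical node positions, in the same {node: (x, y)} shape a layout returns
pos = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0)}
edges = [("A", "B"), ("B", "C")]

x_edges, y_edges = [], []
for start, end in edges:
    # None ends the current segment, so A-B and B-C are drawn as separate lines
    x_edges.extend([pos[start][0], pos[end][0], None])
    y_edges.extend([pos[start][1], pos[end][1], None])

print(x_edges)
print(y_edges)
```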

Plotly Graph Code

Now we can finally plot the network as an interactive Plotly visualisation.

First, we need to calculate all the quantities and concatenate them into an HTML string to be used as the input for the tooltip. Then we need to calculate the coordinates for the scatter and line plots, and finally define the node colour and size, which depend on the sign and magnitude of the annualised returns.

# make list of node labels.
node_label = list(mst.nodes())
# calculate annualised returns and volatility (raw values and strings formatted to 1dp)
annual_vol, annual_ret, annual_vol_2dp, annual_ret_2dp = calculate_stats()
# get top and bottom 3 correlations for each node
top_3_corrs, bottom_3_corrs = get_top_and_bottom_three()


# create tooltip string by concatenating statistics
description = [
    f"<b>{node}</b>"
    + "<br>"
    + annual_ret_2dp[index]
    + "<br>"
    + annual_vol_2dp[index]
    + "<br><br>Strongest correlations with: "
    + "<br>"
    + top_3_corrs[index]
    + "<br>Weakest correlations with: "
    "<br>" + bottom_3_corrs[index]
    for index, node in enumerate(node_label)
]

# get coordinates for nodes and edges
Xnodes, Ynodes, Xedges, Yedges = get_coordinates()

# assign node colour depending on positive or negative annualised returns
node_colour = [assign_colour(i) for i in annual_ret]

# assign node size based on annualised returns size (scaled by a factor)
node_size = [abs(x) ** 0.5 * 5 for x in annual_ret]

To plot the network we define the line plot (tracer) and scatter plot (tracer_marker) series and define some cosmetic parameters which dictate the graph layout.

# edges
tracer = go.Scatter(
    x=Xedges,
    y=Yedges,
    mode="lines",
    line=dict(color="#DCDCDC", width=1),
    hoverinfo="none",
    showlegend=False,
)


# nodes
tracer_marker = go.Scatter(
    x=Xnodes,
    y=Ynodes,
    mode="markers+text",
    textposition="top center",
    marker=dict(size=node_size, line=dict(width=1), color=node_colour),
    hoverinfo="text",
    hovertext=description,
    text=node_label,
    textfont=dict(size=7),
    showlegend=False,
)


axis_style = dict(
    # title.font replaces the deprecated titlefont attribute
    title=dict(text="", font=dict(size=20)),
    showgrid=False,
    zeroline=False,
    showline=False,
    ticks="",
    showticklabels=False,
)


layout = dict(
    title="Plotly - interactive minimum spanning tree",
    width=800,
    height=800,
    autosize=False,
    showlegend=False,
    xaxis=axis_style,
    yaxis=axis_style,
    hovermode="closest",
    plot_bgcolor="#fff",
)


fig = go.Figure()
fig.add_trace(tracer)
fig.add_trace(tracer_marker)
fig.update_layout(layout)

fig.show()

display(
    HTML(
        """
        <p>Node sizes are proportional to the size of
        annualised returns.<br>Node colours signify positive
        or negative returns since beginning of the timeframe.</p>
        """
    )
)

Node sizes are proportional to the size of annualised returns.
Node colours signify positive or negative returns since beginning of the timeframe.

With this final network layout, we have summarised the most important information about the relationships between assets in the dataset. Users can quickly identify clusters of assets which are strongly correlated with each other, gauge the performance of the asset over the timeframe by looking at the node sizes and colours and can get more detailed information about each node by hovering over it.

Conclusion

In this post we have shown how to visualise a network of asset price correlations using the NetworkX Python library. The 'out-of-the-box' NetworkX visualisations were iteratively improved to help the audience identify assets which behave similarly to other assets in the dataset. This was achieved by reducing redundant information in the network and using intuitive colour schemes, edge thickness and node size to convey information in a qualitative way. The functionality of the visualisation was further improved by the use of Plotly, an interactive graphing library, which allowed us to use tooltips to store more information about each node in the network without cluttering the visualisation.

Future Work

Further work could look at the rolling asset correlations over a shorter timeframe and see how these may have changed over time and how this then affects the network layout and clusters. This could help identify outliers or asset classes which are not behaving as 'normal'. There are also many more interesting calculations which can be carried out on the network, such as working out which are the most important or influential nodes in the networks by using centrality measures [7].
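As a starting point for the rolling analysis, pandas can compute a rolling correlation between two return series directly. The sketch below uses randomly generated returns and an assumed 60-day window. With the dataset from this post, the returns would first need sorting into chronological order (e.g. with sort_index()), as the file is stored newest first.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-02", periods=120, freq="B")

# hypothetical daily log returns, stored oldest to newest
returns = pd.DataFrame(
    rng.normal(0, 0.01, size=(120, 2)),
    index=dates,
    columns=["asset_a", "asset_b"],
)

window = 60  # assumed window length in trading days
rolling_corr = returns["asset_a"].rolling(window).corr(returns["asset_b"])

# the first window - 1 values are NaN because the window is not yet full
print(rolling_corr.dropna().describe())
```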

The post focused on how to visualise the network using networkx and Plotly, however, there are many other libraries and software which could have been used and are worth investigating:

Open Source Python Libraries

  • Graphviz
  • Pygraphviz

Open Source Interactive Libraries

  • d3.js

Network drawing Software

If you are interested in investigating network analytics in the financial industry in more detail, I would highly recommend checking out https://www.fnalab.com/, a commercial platform for financial network analytics. They offer a free trial of their platform, which has some very impressive network visualisations of the stock market as well as of financial transactions for fraud detection and many other applications.

References

[1] Networkx documentation: https://networkx.github.io/documentation/stable/reference/drawing.html
[2] Fruchterman, T.M.J. and Reingold, E.M., 1991. Graph drawing by force-directed placement. Software: Practice and Experience, 21(11), pp.1129-1164.
[3] Mantegna, R.N., 1999. Hierarchical structure in financial markets. The European Physical Journal B-Condensed Matter and Complex Systems, 11(1), pp.193-197.
[4] Rešovský, M., Horváth, D., Gazda, V. and Siničáková, M., 2013. Minimum Spanning Tree Application in the Currency Market. Biatec, 21(7), pp.21-23.
[5] Financial Network Analytics: https://www.fna.fi/
[6] Plotly: https://plot.ly/d3-js-for-python-and-pandas-charts/
[7] Sun, W., Tian, C. and Yang, G., 2015. Network Analysis of the Stock Market.